{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5分钟用Python生成极品家丁全本小说词云\n",
    "\n",
    "使用如下Python库：\n",
    "1. jieba：中文分词库，自然语言处理必备，https://github.com/fxsjy/jieba\n",
    "2. imageio：读取图片，或者写出图片，http://imageio.github.io\n",
    "3. wordcloud：词云生成，可以设定背景图片，https://github.com/amueller/word_cloud"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import jieba\n",
    "import imageio\n",
    "import wordcloud"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1、将《极品家丁》小说文字分词"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "fin = open('jipinjiading.txt', encoding='utf-8')\n",
    "# 读取txt全部文字\n",
    "txt = fin.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\ufeff《极品家丁》 / 作者：禹岩\\n简介：【本书阅读方法】参考《唐伯虎点秋香》！年轻的销售经理，因为一次意外经历，来到了一个完全不同的世界，成为萧家大宅里一名光荣的——家丁!暮晓春来迟先于百花知岁岁种桃树'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "txt[:100]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Building prefix dict from the default dictionary ...\n",
      "Loading model from cache C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\jieba.cache\n",
      "Loading model cost 0.703 seconds.\n",
      "Prefix dict has been built succesfully.\n"
     ]
    }
   ],
   "source": [
    "# 使用jieba分词，精确模式（还有全模式和搜索引擎模式）\n",
    "wordlist = jieba.lcut(txt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "list"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(wordlist)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['\\ufeff', '《', '极品', '家丁', '》', ' ', '/', ' ', '作者', '：']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wordlist[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "string = \" \".join(wordlist)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2、读取图片-用于词云的背景图"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "image = imageio.imread(\"python-logo.jpg\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(600, 600, 3)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 该图片是600行、600列的尺寸，每个相似点由3个数字组成\n",
    "image.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align:center\">Python-logo图片</div>\n",
    "<img style=\"width:200px;height:200px\" src=\"python-logo.jpg\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Array([255, 255, 255], dtype=uint8)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 这三个数字组合起来代表一个颜色\n",
    "# 比如255、255、255是白色的\n",
    "image[0][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Array([ 55, 113, 163], dtype=uint8)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 图片顶部，中间的位置\n",
    "image[200][300]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align:center\">像素图片：</div>\n",
    "<div style=\"text-align:center\">在这个网站转换：https://www.sioe.cn/yingyong/yanse-rgb-16/</div>\n",
    "<img src=\"python-logo-pix01.png\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Array([255, 213,  69], dtype=uint8)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 图片底部，中间的位置\n",
    "image[500][200]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align:center\">像素图片：</div>\n",
    "<div style=\"text-align:center\">在这个网站转换：https://www.sioe.cn/yingyong/yanse-rgb-16/</div>\n",
    "<img src=\"python-logo-pix02.png\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3、生成词云图片"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "wc = wordcloud.WordCloud(width=600,\n",
    "                        height=600,\n",
    "                        background_color='white',\n",
    "                        # 指定字体路径，中文需要指定\n",
    "                        font_path='msyh.ttc',\n",
    "                        # mask 指定词云形状图片，默认为矩形\n",
    "                        mask=image,\n",
    "                        # 提高清晰度\n",
    "                        scale=15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<wordcloud.wordcloud.WordCloud at 0x186068755c8>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 将string变量传入wc的generate()方法\n",
    "# 给词云输入文字\n",
    "wc.generate(string)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<wordcloud.wordcloud.WordCloud at 0x186068755c8>"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 将词云图片导出到当前文件夹\n",
    "wc.to_file('result_wordcloud.png')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"result_wordcloud.png\" style=\"width:400px;height:400px\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
